Explain Images with Multimodal Recurrent Neural Networks
نویسندگان
چکیده
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
منابع مشابه
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional networ...
متن کاملRobust stability of stochastic fuzzy impulsive recurrent neural networks with\ time-varying delays
In this paper, global robust stability of stochastic impulsive recurrent neural networks with time-varyingdelays which are represented by the Takagi-Sugeno (T-S) fuzzy models is considered. A novel Linear Matrix Inequality (LMI)-based stability criterion is obtained by using Lyapunov functional theory to guarantee the asymptotic stability of uncertain fuzzy stochastic impulsive recurrent neural...
متن کاملWMT 2016 Multimodal Translation System Description based on Bidirectional Recurrent Neural Networks with Double-Embeddings
Bidirectional Recurrent Neural Networks (BiRNNs) haveshown outstanding results on sequence-to-sequence learning tasks. This architecture becomes specially interesting for multimodal machine translation task, since BiRNNs can deal with images and text. On most translation systems the same word embedding is fed to both BiRNN units. In this paper, we present several experiments to enhance a baseli...
متن کاملImage Caption Generation with Recursive Neural Networks
The ability to recognize image features and generate accurate, syntactically reasonable text descriptions is important for many tasks in computer vision. Auto-captioning could, for example, be used to provide descriptions of website content, or to generate frame-by-frame descriptions of video for the vision-impaired. In this project, a multimodal architecture for generating image captions is ex...
متن کاملImage Backlight Compensation Using Recurrent Functional Neural Fuzzy Networks Based on Modified Differential Evolution
In this study, an image backlight compensation method using adaptive luminance modification is proposed for efficiently obtaining clear images.The proposed method combines the fuzzy C-means clustering method, a recurrent functional neural fuzzy network (RFNFN), and a modified differential evolution.The proposed RFNFN is based on the two backlight factors that can accurately detect the compensat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1410.1090 شماره
صفحات -
تاریخ انتشار 2014